Node Types
Introduction to nodes
A node is a building block of a Towhee pipeline. Each node applies its own function or transformation to process data. By connecting nodes, you can create directed acyclic graphs (DAGs) and build pipelines. If you are new to Towhee, it is highly recommended that you read the section Create Your First Pipeline before going through this programming guide.
Node types
Currently, Towhee supports nine types of nodes. Input and output nodes define a pipeline's input and output. The remaining seven types, each with its own data transformation, are typically used for data processing and analytics. The following table lists the nine node types in Towhee and their corresponding interfaces.
Node Type | Description |
---|---|
input(input_schema) | This node defines the input schema of a pipeline and is the beginning of a pipeline's definition. Note that a pipeline's input schema cannot be empty. Refer to input API for more details. |
output(output_schema) | This node defines the pipeline's output schema, and ends a pipeline definition. Once called, a pipeline instance will be created and returned. Refer to output API for more details. |
map(input_schema, output_schema, func) | This node applies the given function func to each of its inputs and returns the transformed data. map returns one row for every row of input. Refer to map API for more details. |
flat_map(input_schema, output_schema, func) | This node applies the function func to every row of input, flattens the results, and returns the flattened data. The returned data can have the same number of rows as the input, or more. This is one of the major differences between flat_map and map, where map always returns the same number of rows as the input. Refer to flat_map API for more details. |
filter(input_schema, output_schema, filter_columns, func) | This node applies the filter function func to the filter_columns and keeps only the rows for which func returns True. Refer to filter API for more details. |
window(input_schema, output_schema, size, step, func) | This node batches the input rows into windows based on the specified window size and step. It then applies the function func to each window and returns one row of results per window. Refer to window API for more details. |
time_window(input_schema, output_schema, timestamp_col, size, step, func) | This node is used to batch rows that have a time sequence, for example, audio or video frames. time_window is similar to window, but the batching rule is applied based on a timestamp column (timestamp_col). size is the time interval of each window, and step determines how far a window moves from the previous one. Note that if step is less than size, the windows will overlap. Refer to time_window API for more details. |
window_all(input_schema, output_schema, func) | This node batches all input rows into one window, and returns the result by applying a function func to the window. Refer to window_all API for more details. |
concat(pipelines) | This node concatenates the intermediate results of multiple pipelines and combines those pipelines into a larger one. Refer to concat API for more details. |
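
The row-count behavior described in the table can be illustrated with a minimal plain-Python sketch. This is not Towhee code and not how Towhee implements its nodes: the helper names (`map_rows`, `flat_map_rows`, `filter_rows`, `window_rows`, `time_window_rows`, `window_all_rows`) are hypothetical, schemas are reduced to single-value rows, and two details are assumptions made for illustration only: a shorter trailing window is still emitted, and time windows start at timestamp 0 with empty windows skipped.

```python
def map_rows(rows, func):
    """One output row for every input row."""
    return [func(row) for row in rows]


def flat_map_rows(rows, func):
    """func returns a list per row; the lists are flattened into rows."""
    return [item for row in rows for item in func(row)]


def filter_rows(rows, func):
    """Keep only the rows for which func returns True."""
    return [row for row in rows if func(row)]


def window_rows(rows, size, step, func):
    """Batch rows into windows of `size` rows, moving `step` rows each time.

    Assumption: a shorter trailing window is still emitted.
    """
    return [func(rows[i:i + size]) for i in range(0, len(rows), step)]


def time_window_rows(rows, timestamps, size, step, func):
    """Batch rows whose timestamp falls in [start, start + size).

    Assumptions: windows start at 0, advance by `step` (must be > 0),
    and windows that contain no rows are skipped.
    """
    results = []
    start = 0
    last = max(timestamps, default=-1)
    while start <= last:
        batch = [r for r, t in zip(rows, timestamps) if start <= t < start + size]
        if batch:
            results.append(func(batch))
        start += step
    return results


def window_all_rows(rows, func):
    """Treat all rows as one window and return a single result row."""
    return [func(rows)]


rows = [1, 2, 3, 4]

mapped = map_rows(rows, lambda x: x + 1)                # [2, 3, 4, 5]
flattened = flat_map_rows(rows, lambda x: [x, x * 10])  # [1, 10, 2, 20, 3, 30, 4, 40]
kept = filter_rows(rows, lambda x: x % 2 == 0)          # [2, 4]
windowed = window_rows(rows, size=2, step=1, func=sum)  # [3, 5, 7, 4]
timed = time_window_rows(rows, [0, 1, 5, 6], size=2, step=2, func=sum)  # [3, 3, 4]
totals = window_all_rows(rows, sum)                     # [10]
```

In this sketch, `map_rows` preserves the row count, `flat_map_rows` can increase it, and the windowing helpers return one row per window. With `step=1` and `size=2`, consecutive windows share a row, which mirrors the overlap noted for time_window when step is less than size.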